ISSN 2587-814X (print),
ISSN 2587-8158 (online)

Russian version: ISSN 1998-0663 (print),
ISSN 2587-8166 (online)

Alla Vladova1,2, Elena Shek3
  • 1 V.A. Trapeznikov Institute of Control Sciences, Russian Academy of Sciences, 65, Profsoyuznaya Street, Moscow 117997, Russia
  • 2 Financial University under the Government of the Russian Federation, 49, Leningradsky Prospect, Moscow 125993, Russia
  • 3 Plekhanov Russian University of Economics, 36, Stremyanny Lane, Moscow 117997, Russia

Data preprocessing for machine analysis of sales representatives’ key performance indicators

2021. No. 3 Vol. 15. P. 48–59

Changes in data acquisition and processing technology are significantly transforming the operational activity of product and service distributors. The work of these companies’ representatives is now largely digitized: for example, travel time and the number and locations of customer meetings are recorded automatically. Nevertheless, the productivity of managers who do not make direct sales is usually evaluated through surveys, expert assessments and costly double visits, although large data samples make it possible to use statistical analysis to identify both insufficient and inflated values of performance indicators. The source data is a relational database that accumulates information on 28 categorical, quantitative, geolocation and temporal parameters of sales representatives’ activities over the past year. From the available data we created synthetic features: latitude and longitude yielded the index, region, street and house features; identifiers were used to calculate the sum of each sales representative’s activities; and temporal features yielded the season of the year, the day of the week and the period of the day. The methodology for statistical analysis consists of three main stages: collecting and processing primary data; summarizing and grouping the processed information; and formulating statistical hypotheses and interpreting the results. A probabilistic approach was used to model the level of distortion of sales representatives’ activities. Using the tag cloud we built, we highlighted the most popular season for advertising campaigns, the most productive departments and sales representatives, and the days of the week with the largest number of customer contacts. We also found a significant number of records of meetings with clients on weekends.
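The derivation of temporal features described above can be illustrated with a minimal sketch. The season mapping (meteorological seasons), the hour boundaries for the period of day, and the names `season_of`, `period_of_day` and `temporal_features` are assumptions made for illustration; the abstract does not specify them.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def season_of(month: int) -> str:
    """Map a month number to a season (assumed: meteorological seasons)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"


def period_of_day(hour: int) -> str:
    """Map an hour to a period of day (boundaries are hypothetical)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 23:
        return "evening"
    return "night"


def temporal_features(ts: datetime) -> dict:
    """Derive synthetic temporal features from one meeting timestamp."""
    period = period_of_day(ts.hour)
    return {
        "season": season_of(ts.month),
        "day_of_week": DAYS[ts.weekday()],  # weekday(): Monday is 0
        "period_of_day": period,
        # Records made at weekends or at night are treated as doubtful.
        "doubtful": ts.weekday() >= 5 or period == "night",
    }


if __name__ == "__main__":
    print(temporal_features(datetime(2021, 7, 3, 23, 30)))
```

A flag of this kind only marks a record for closer inspection; whether a weekend or night meeting is actually a distortion depends on the company's working schedule.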
As a result of the data mining, we formulated a statistical hypothesis that sales representatives who distort the number and parameters of meetings can be identified. A set of synthetic integer, real and categorical features was created to identify hidden relationships. Doubtful data (such as work at weekends or at night) were revealed. The resulting aggregated dataset is grouped by sales representative’s activity ID, and the distribution of this feature is plotted. For each sales representative, integer and real features are summarized, and outliers that characterize inefficient performance or distortion of data are detected. Thus, the availability of a large sample of data on the history of movements and activities allowed us to evaluate the productivity of the distribution company’s sales representatives based on indirect features.
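The grouping and outlier step can be sketched as follows. The abstract does not name the outlier criterion, so this example assumes Tukey's interquartile-range fences with the conventional multiplier 1.5; the record layout `(rep_id, activity_count)` and the function names are likewise hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def total_by_rep(records):
    """Sum activity counts per sales representative ID."""
    totals = defaultdict(int)
    for rep_id, count in records:
        totals[rep_id] += count
    return dict(totals)


def flag_outliers(totals, k=1.5):
    """Flag totals outside the IQR fences (assumed criterion, not the authors').

    'low' suggests insufficient activity; 'high' suggests possibly
    inflated figures that deserve a closer look.
    """
    q1, _, q3 = quantiles(totals.values(), n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flags = {}
    for rep_id, total in totals.items():
        if total < lower:
            flags[rep_id] = "low"
        elif total > upper:
            flags[rep_id] = "high"
    return flags


if __name__ == "__main__":
    # Made-up illustrative records, not data from the study.
    sample = [("a", 60), ("a", 40), ("b", 110), ("c", 105),
              ("d", 95), ("e", 102), ("f", 400), ("g", 5)]
    print(flag_outliers(total_by_rep(sample)))
```

Quartile-based fences are used here because they are not pulled toward the extreme values they are meant to detect, unlike a rule based on the mean and standard deviation.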

Citation: Vladova A.Yu., Shek E.D. (2021) Data preprocessing for machine analysis of sales representatives’ key performance indicators. Business Informatics, vol. 15, no. 3, pp. 48–59. DOI: 10.17323/2587-814X.2021.3.48.59